Description

Background and Context

Thera Bank recently saw a steep decline in the number of users of its credit cards. Credit cards are a good source of income for banks because of the different kinds of fees they charge, such as annual fees, balance transfer fees, cash advance fees, late payment fees, and foreign transaction fees. Some fees are charged to every user irrespective of usage, while others are charged only under specified circumstances.

Customers leaving its credit card services would lead the bank to a loss, so the bank wants to analyze customer data to identify the customers who are likely to leave and the reasons why, so that it can improve in those areas.

As a data scientist at Thera Bank, you need to come up with a classification model that will help the bank improve its services so that customers do not give up their credit cards.

You need to identify the best possible model that will give the required performance.

Objective

  1. Explore and visualize the dataset.
  2. Build a classification model to predict whether a customer will churn.
  3. Optimize the model using appropriate techniques.
  4. Generate a set of insights and recommendations that will help the bank.

Data Description

Data Dictionary

Let's start coding!

Imblearn installation
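imbalanced-learn (imported as `imblearn`) is not bundled with scikit-learn, so it typically needs a one-off install before the resampling sections below will run:

```shell
pip install -U imbalanced-learn
```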

Observations

Observations

Observations

Let's encode the categorical variables with relevant levels.

Exploratory Data Analysis

Univariate Analysis

Histogram

Observations

Observations

Observations

Observations

Observations

Observations

Observations

Observations

Observations

Observations

Barplot

Observations

Observations

Observations

Observations

Observations

Bivariate Analysis

Observations

Stacked Bar Chart

Observations

Observations

Observations

Observations

Observations

Observations

Observations

Observations

Split the dataset into train and test sets
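The bank's dataset isn't reproduced here, so as a minimal sketch the example below uses synthetic data from `make_classification` with roughly 16% positives standing in for the churn label. The key point is `stratify=y`, which keeps the churn ratio identical in the train and test splits:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the bank data: ~16% churners (imbalanced target).
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.84], random_state=42)

# stratify=y preserves the class ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

print(X_train.shape, X_test.shape)  # (700, 10) (300, 10)
```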

Let's impute the missing values
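A minimal sketch of one common approach, assuming (hypothetically) that some categorical columns mark missing entries with the string "Unknown": convert those entries to `NaN`, then fill them with the most frequent category via `SimpleImputer`.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical column where "Unknown" denotes a missing entry.
raw = np.array([["Graduate"], ["Unknown"], ["High School"], ["Graduate"]],
               dtype=object)
raw[raw == "Unknown"] = np.nan  # mark as missing first

# Fill missing entries with the most frequent category in the column.
imputer = SimpleImputer(strategy="most_frequent")
print(imputer.fit_transform(raw).ravel())
```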

Observation:

Model Building:

Decision Tree Classifier
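A baseline sketch on the same kind of synthetic stand-in data: an unconstrained decision tree memorizes the training set, so training recall is perfect while test recall is typically noticeably lower, which is the overfitting that pruning and ensembles address later.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import recall_score

# Synthetic imbalanced stand-in for the churn data.
X, y = make_classification(n_samples=1000, weights=[0.84], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# An unconstrained tree fits the training data perfectly (overfits).
dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train, y_train)

print("train recall:", recall_score(y_train, dt.predict(X_train)))  # 1.0
print("test recall :", recall_score(y_test, dt.predict(X_test)))
```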

Observations:

Cost Complexity Pruning
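A sketch of the pruning mechanics: `cost_complexity_pruning_path` returns the effective alphas at which subtrees get pruned away; fitting one tree per alpha gives progressively smaller trees, down to a single root node at the largest alpha. In practice one would pick the alpha with the best cross-validated score.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, weights=[0.84], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# Effective alphas along the minimal cost-complexity pruning path.
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(
    X_train, y_train)

# One tree per alpha: larger alpha -> more pruning -> smaller tree.
trees = [DecisionTreeClassifier(random_state=42, ccp_alpha=a).fit(X_train, y_train)
         for a in path.ccp_alphas]

# The largest alpha prunes everything down to the root node.
print("leaves:", trees[0].get_n_leaves(), "->", trees[-1].get_n_leaves())
```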

Observation:

Random Forest Classifier
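A hedged sketch, again on synthetic data: `class_weight="balanced"` upweights the minority (churn) class, and `oob_score=True` gives a built-in validation estimate from the rows each tree never saw in its bootstrap sample.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, weights=[0.84], random_state=42)

# Balanced class weights counter the imbalance; OOB rows act as a
# free validation set.
rf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                            oob_score=True, random_state=42, n_jobs=-1)
rf.fit(X, y)

print("OOB accuracy:", round(rf.oob_score_, 3))
```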

Observation:

Bagging Classifier
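A minimal sketch: `BaggingClassifier` defaults to decision-tree base learners, trains each on a bootstrap sample, and combines their votes, which reduces the single tree's variance.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=1000, weights=[0.84], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# Default base learner is a decision tree; 100 bootstrap-trained copies
# vote on the final prediction.
bag = BaggingClassifier(n_estimators=100, random_state=42, n_jobs=-1)
bag.fit(X_train, y_train)

print("test accuracy:", round(bag.score(X_test, y_test), 3))
```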

Observation:

AdaBoost Classifier
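A sketch of the boosting variant: AdaBoost fits shallow trees (stumps by default) sequentially, upweighting the rows each previous learner misclassified.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=1000, weights=[0.84], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# Sequentially fitted weak learners; misclassified rows get higher
# weight in the next round.
ada = AdaBoostClassifier(n_estimators=100, random_state=42)
ada.fit(X_train, y_train)

print("test accuracy:", round(ada.score(X_test, y_test), 3))
```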

Observation:

Gradient Boosting Classifier
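A sketch for comparison: gradient boosting also builds trees sequentially, but each new tree fits the gradient of the loss rather than reweighting rows; by default it grows 100 depth-3 trees with a 0.1 learning rate.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, weights=[0.84], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# Each stage fits the residual errors (loss gradient) of the ensemble so far.
gb = GradientBoostingClassifier(random_state=42)
gb.fit(X_train, y_train)

print("test accuracy:", round(gb.score(X_test, y_test), 3))
```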

Observation:

Logistic Regression
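A minimal sketch of the linear baseline: logistic regression is sensitive to feature scale, so the features are standardized first (with the scaler fit on the training split only).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, weights=[0.84], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# Fit the scaler on the training split only to avoid leakage.
scaler = StandardScaler().fit(X_train)
lr = LogisticRegression(max_iter=1000, random_state=42)
lr.fit(scaler.transform(X_train), y_train)

print("test accuracy:", round(lr.score(scaler.transform(X_test), y_test), 3))
```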

Observation:

Oversampling and Undersampling the train data

Oversampling train data using SMOTE

Let's train a decision tree classifier using the oversampled data

Let's check the performance using the confusion matrix

Observations:

Logistic Regression

Observations:

Decision Tree Classifier

Observations:

Random Forest Classifier

Observations:

Bagging Classifier

Observations:

AdaBoost Classifier

Observations:

Gradient Boosting Classifier

Observations:

Undersampling train data using Random Undersampler

Let's train a decision tree classifier using the undersampled data

Observations:

Random Forest Classifier

Observations:

Bagging Classifier

Observations:

AdaBoost Classifier

Observations:

Gradient Boosting Classifier

Observations:

Logistic Regression

Observations:

Select three models to be tuned using Randomized Search CV

Randomized Search CV
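A hedged sketch of the tuning step for one of the three models (random forest), with a hypothetical parameter grid: `RandomizedSearchCV` samples `n_iter` parameter combinations instead of trying them all, and `scoring="recall"` reflects that missing a churner costs the bank more than a false alarm.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=1000, weights=[0.84], random_state=42)

# Hypothetical search space; the notebook's actual grid may differ.
param_dist = {
    "n_estimators": [50, 100, 150],
    "max_depth": [3, 5, 7, None],
    "max_features": ["sqrt", 0.5],
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=5, scoring="recall", cv=3, random_state=42, n_jobs=-1)
search.fit(X, y)

print("best params:   ", search.best_params_)
print("best CV recall:", round(search.best_score_, 3))
```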

Random Forest Classifier Tuned

Choose the best model and evaluate its performance on the test set

Observations:

AdaBoost Tuned

Observations:

Gradient Boosting Tuned

Observations:

Training Performance Comparison

Let's create a pipeline that chains preprocessing and modeling, making the end-to-end workflow reproducible and the overall performance easier to verify.

Pipeline

make_pipeline
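A minimal sketch of the helper: `make_pipeline` chains steps and names them automatically from their class names, and fitting the whole pipeline ensures the scaler only ever sees training data, preventing test-set leakage. (When a resampling step such as SMOTE must be part of the chain, `imblearn.pipeline.make_pipeline` offers the same interface, since plain scikit-learn pipelines cannot hold resamplers.)

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, weights=[0.84], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# Steps are auto-named from the estimator class names.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_train, y_train)  # scaler fit on training data only

print(list(pipe.named_steps))  # ['standardscaler', 'logisticregression']
print("test accuracy:", round(pipe.score(X_test, y_test), 3))
```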

Business Insights